feat(node): replication enforcement (Phase 2) for #18 #34

Open: beardthelion wants to merge 19 commits into Gitlawb:main from beardthelion:feat/phase2-replication-enforcement

@beardthelion beardthelion commented Jun 8, 2026

Contributor

Phase 2 of path-scoped visibility (#18): stop withheld content from leaving the origin node through replication, and stop fully-private repos from being announced to the network. Phase 1 (#25) gates the git read path and Phase 3 (#28) withholds blobs from served packs, but after a push three paths still copied objects off the node ignoring visibility: local IPFS pinning, Pinata pinning, and the gossip/peer-notify/Arweave announcements.

The whole thing reduces to one decision computed once per push in git_receive_pack: can an anonymous caller read the repo root, and which blob OIDs are denied to the public. A withheld: Option<HashSet<String>> drives both pin sites (None means the repo is private, so nothing replicates, not even commit and tree objects), and an announce bool gates the network-facing announcements.
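
A rough sketch of that per-push decision, for orientation only. `PushDecision` and `decide` are hypothetical names, not taken from the diff, and the behaviour of `announce` when the withheld set cannot be computed is an assumption (the description only says that such a push replicates nothing).

```rust
use std::collections::HashSet;

/// Hypothetical container for the per-push decision (illustrative names).
#[derive(Debug, PartialEq)]
struct PushDecision {
    /// Gates gossip, peer-notify, and Arweave announcements.
    announce: bool,
    /// `None`: nothing replicates, not even commit and tree objects.
    /// `Some(set)`: everything replicates except these blob OIDs.
    withheld: Option<HashSet<String>>,
}

/// `root_readable`: can an anonymous caller read the repo root?
/// `denied`: blob OIDs denied to the public, or an error if that
/// could not be determined.
fn decide(root_readable: bool, denied: Result<HashSet<String>, String>) -> PushDecision {
    if !root_readable {
        // Private repo: no announcements and no pinning at all.
        return PushDecision { announce: false, withheld: None };
    }
    PushDecision {
        // Assumption: announcements depend only on root readability.
        announce: true,
        // Fail closed: an error collapses to `None`, so nothing is pinned.
        withheld: denied.ok(),
    }
}
```

Under these assumptions, a mode B repo yields `announce: true` with a non-empty `Some(set)`, while a fully-private repo yields `announce: false` with `None`.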

What changes:

  • IPFS and Pinata pinning skip the withheld blob OIDs (via a small pure replicable_objects filter). For a private repo they pin nothing at all, so file names in tree objects and history in commit objects no longer reach public IPFS.
  • Gossip ref-update publish, the HTTP peer-notify fallback, and Arweave anchoring are suppressed for repos the public cannot read. Mode B repos (public with a private subtree) still announce, since their commit and tree SHAs are public.
  • Fail closed: if visibility can't be determined, the push replicates nothing.
  • The in-process GraphQL subscription broadcast and the local branch->CID write are left alone; they are owner-facing/local, not network leaks.
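
A minimal sketch of how a pin site might apply the filter. `replicable_objects` is named in this PR, but the signature shown is assumed, and `objects_to_pin` is a hypothetical wrapper added only to show the `None` case.

```rust
use std::collections::HashSet;

/// Assumed shape of the pure filter: drop every OID in `withheld`,
/// keep the rest in their original order.
fn replicable_objects(objects: Vec<String>, withheld: &HashSet<String>) -> Vec<String> {
    objects
        .into_iter()
        .filter(|oid| !withheld.contains(oid))
        .collect()
}

/// Hypothetical pin-site helper: `None` means the repo is private
/// (or visibility could not be determined), so nothing is pinned.
fn objects_to_pin(objects: Vec<String>, withheld: Option<&HashSet<String>>) -> Vec<String> {
    match withheld {
        None => Vec::new(),
        Some(set) => replicable_objects(objects, set),
    }
}
```

Keeping the filter pure (no I/O, no git calls) is what makes it cheap to unit-test independently of IPFS or Pinata.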

Deferred on purpose, each cheap to add later off the same seam: peer partial-mirrors (peers currently fail closed on repos with withheld content), UCAN-delegated reader sets, and encrypted-at-rest replication of private blobs.

Depends on #28: withheld_blob_oids lives on that branch. This PR is stacked on it, so until #28 merges the diff here will also show #28's commits. Rebase onto main once #28 lands.

Test plan

  • cargo test -p gitlawb-node (100 pass), cargo clippy --all-targets -D warnings clean, cargo fmt --check clean
  • Unit coverage: replicable_objects filter, anonymous-caller contract of withheld_blob_oids, and the announce gate across public / legacy-private / mode A / mode B
  • Manual: push to a node with a mode B /secret/** rule, confirm the secret blob is absent from IPFS/Pinata while public files and the commit/tree are present
  • Manual: push to a fully-private repo, confirm no objects pinned and no gossip/peer-notify/Arweave anchor

Summary by CodeRabbit

  • New Features

    • Selective content withholding enforced for git clone/fetch and push flows; replication and external pinning now respect visibility.
    • Pack generation can exclude restricted blob content so clients do not receive withheld objects.
  • Tests

    • Comprehensive unit and end-to-end tests validating withholding, filtered pack construction, and announce gating.
  • Chores

    • Updated .gitignore to ignore generated docs directory.

…al clone

upload_pack_excluding emitted a v2 packfile section, but info_refs
advertises v0, so real clients negotiated v0 and rejected the response
with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape
instead (NAK, then the pack via side-band-64k when offered).

Add an end-to-end test that stands up info_refs + upload_pack_excluding
and runs a real git partial clone, asserting the withheld blob's bytes
never reach the client while its tree entry and SHA stay visible. A stock
full clone cannot consume the pack (it is not closed under reachability,
so fetch fails the connectivity check); a partial clone is required.
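
For readers unfamiliar with the v0 response shape described above, a minimal sketch of the framing follows. The function names are hypothetical and this is not the code in `upload_pack_excluding`; it only illustrates pkt-line encoding, the leading NAK, and band-1 side-band-64k chunks followed by a flush packet.

```rust
/// Encode one pkt-line: four hex digits giving the total length
/// (payload plus the 4 length bytes), followed by the payload.
fn pkt_line(payload: &[u8]) -> Vec<u8> {
    let mut out = format!("{:04x}", payload.len() + 4).into_bytes();
    out.extend_from_slice(payload);
    out
}

/// side-band-64k caps a packet at 65520 bytes, including the
/// 4-byte length header and the 1-byte band indicator.
const MAX_SIDEBAND_DATA: usize = 65520 - 5;

/// Frame a v0 stateless-rpc upload-pack response: NAK first, then the pack.
fn frame_v0_response(pack: &[u8], side_band_64k: bool) -> Vec<u8> {
    let mut out = pkt_line(b"NAK\n");
    if side_band_64k {
        for chunk in pack.chunks(MAX_SIDEBAND_DATA) {
            let mut payload = Vec::with_capacity(chunk.len() + 1);
            payload.push(1u8); // band 1 carries pack data
            payload.extend_from_slice(chunk);
            out.extend_from_slice(&pkt_line(&payload));
        }
        out.extend_from_slice(b"0000"); // flush-pkt ends the stream
    } else {
        // Without side-band the raw pack bytes follow the NAK directly.
        out.extend_from_slice(pack);
    }
    out
}
```
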
…tion choice

Add a real-git test that partial-clones, pushes a new commit server-side,
then fetches: the new object arrives and the withheld blob stays absent.
This pins down that ignoring have/want negotiation (always sending a
self-contained pack of all refs minus withheld, with NAK) is correct for
both clone and fetch; the only cost is a fetch re-sends the full object
set. Refactor the real-git tests onto a shared server harness and document
the negotiation decision in code and in the plan's follow-ups.

Move the two blocking git shell-outs in the filtered upload-pack path off
the async worker thread, matching the tokio::process / spawn_blocking usage
already in this file: build_filtered_pack (rev-list + pack-objects) and
withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large
repo cannot stall the tokio runtime. Behavior is unchanged.

Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2
packfile framing, which is the exact path that failed against a real client
and was corrected to v0. The block now documents the shipped v0 contract.
Drop a stray trailing code fence flagged by markdownlint (MD040).

The speculative ls-tree timeout and the public/no-rules fast-path from the
review are intentionally left out: the timeout guards against adversarial
repos we do not yet host, and the fast-path is a micro-optimization not worth
the extra branch right now.

kevincodex1 asked to keep the superpowers planning docs out of the repo. The
Phase 3 plan was scaffolding for this change, not something the project needs
to carry. Removing it leaves only the code and tests in the PR.

coderabbitai Bot commented Jun 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7fee617a-0ad1-467d-9f74-3908893536d6

📥 Commits

Reviewing files that changed from the base of the PR and between 949d131 and 083293d.

📒 Files selected for processing (1)
  • crates/gitlawb-node/src/api/repos.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/gitlawb-node/src/api/repos.rs

📝 Walkthrough

Walkthrough

Compute withheld blob OIDs from path visibility rules, serve filtered upload-pack responses excluding those OIDs, integrate withholding into git-upload-pack and git-receive-pack handlers, and gate IPFS/Pinata pinning plus P2P/HTTP/Arweave dissemination on an announce decision. Tests validate filtering and end-to-end clone/fetch behavior.

Changes

Visibility-aware blob withholding for Git read and replication

| Layer | File(s) | Summary |
|---|---|---|
| Visibility pack core logic | crates/gitlawb-node/src/git/visibility_pack.rs, crates/gitlawb-node/src/git/mod.rs | New module computes withheld blob OIDs by evaluating visibility rules per (blob, path) pair across all refs. Exports withheld_blob_oids and replicable_objects. Includes unit tests covering anonymous/reader/owner scenarios. |
| Smart HTTP pack filtering and serving | crates/gitlawb-node/src/git/smart_http.rs | Adds build_filtered_pack to enumerate reachable objects and build packs excluding withheld OIDs, and upload_pack_excluding handler to serve filtered packs with git protocol v0 framing and optional side-band-64k. Includes helpers, unit tests, and end-to-end smart-HTTP clone/fetch tests. |
| Upload-pack endpoint integration | crates/gitlawb-node/src/api/repos.rs (upload-pack handler) | Integrates visibility_pack::withheld_blob_oids into git_upload_pack to route to upload_pack or upload_pack_excluding based on withheld set emptiness; adjusts protocol error mapping and documents that subtree (mode B) rules do not affect info-refs advertisement. |
| Receive-pack Phase 2 replication control | crates/gitlawb-node/src/api/repos.rs (receive-pack, pinning, dissemination handlers) | Derives announce per push from visibility rules; when announce=false skips pinning/dissemination and sets withheld=None; when announce=true computes withheld and pins accordingly. Gates P2P ref publishing, HTTP peer sync, and Arweave anchoring on announce. |
| Pinning API withheld set support | crates/gitlawb-node/src/ipfs_pin.rs, crates/gitlawb-node/src/pinata.rs | Updates IPFS and Pinata pin_new_objects signatures to accept withheld: &HashSet<String> and filter enumerated object lists via replicable_objects(withheld) before pinning. |
| Validation tests and configuration | crates/gitlawb-node/src/visibility.rs, .gitignore | Adds unit test validating announce-gate logic for whole-repo / readability and updates .gitignore to exclude docs/superpowers/. |
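
The upload-pack routing summarized above can be pictured with a small sketch. `ServePath` and `choose_serve_path` are hypothetical names used only for illustration; the actual handler in `repos.rs` may be structured differently.

```rust
use std::collections::HashSet;

/// Hypothetical tag for which serve path a request takes.
#[derive(Debug, PartialEq)]
enum ServePath<'a> {
    /// Nothing is withheld from this caller: use the ordinary upload_pack.
    Unfiltered,
    /// Some blobs are withheld: use upload_pack_excluding with this set.
    Excluding(&'a HashSet<String>),
}

/// Route on whether the withheld set is empty, as the walkthrough describes.
fn choose_serve_path(withheld: &HashSet<String>) -> ServePath<'_> {
    if withheld.is_empty() {
        ServePath::Unfiltered
    } else {
        ServePath::Excluding(withheld)
    }
}
```

Routing on emptiness means callers with nothing withheld stay on the existing, fully negotiated upload-pack path.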

Sequence Diagram — upload-pack / filtered pack flow

sequenceDiagram
  participant Client
  participant GitUploadPack
  participant VisibilityPack
  participant SmartHTTP
  Client->>GitUploadPack: POST /git-upload-pack (negotiation)
  GitUploadPack->>VisibilityPack: withheld_blob_oids(repo, rules, caller)
  VisibilityPack-->>GitUploadPack: withheld set
  alt withheld set empty
    GitUploadPack->>SmartHTTP: upload_pack()
  else withheld set non-empty
    GitUploadPack->>SmartHTTP: upload_pack_excluding(withheld)
  end
  SmartHTTP->>SmartHTTP: build_filtered_pack(withheld)
  SmartHTTP-->>Client: framed pack (withheld blobs excluded)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Gitlawb/node#25: Builds on Phase 1 visibility enforcement by integrating visibility decisions into upload/receive pack behavior and replication gating.

Suggested reviewers

  • kevincodex1

Poem

🐰 In hidden blobs the secrets hide away,
Packs are pruned so prying eyes can't play.
Rules decide what gets to roam or stay,
Pinning and announce keep replication at bay.
A hopping cheer — safe repos on display!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: replication enforcement (Phase 2) for visibility control, which aligns with the core objective of preventing withheld content from leaving the origin node through replication paths. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 629-644: The match arm currently calls
crate::git::visibility_pack::withheld_blob_oids(...) directly on the async
worker (using disk_path, rules, record.is_public, &record.owner_did). Move it
into a blocking task: replace the direct call with
tokio::task::spawn_blocking(move ||
crate::git::visibility_pack::withheld_blob_oids(...)).await, handle the join
result, map the Result to an Option the same way as before, and keep the
tracing::warn! on errors, so the git ls-tree subprocess runs off the async
runtime thread.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a00a4311-c564-4086-b45f-866546839dd1

📥 Commits

Reviewing files that changed from the base of the PR and between 6abaf1d and 949d131.

📒 Files selected for processing (8)
  • .gitignore
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/smart_http.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
  • crates/gitlawb-node/src/ipfs_pin.rs
  • crates/gitlawb-node/src/pinata.rs
  • crates/gitlawb-node/src/visibility.rs

Comment thread crates/gitlawb-node/src/api/repos.rs
The receive-pack replication chokepoint called withheld_blob_oids
directly on the tokio worker, where its blocking git ls-tree walk can
stall the runtime for repos with many refs. Wrap it in spawn_blocking
to match the upload-pack serve path.